# Parsing Non-standard Data Formats in EEGUnity

This tutorial covers non-standard file types supported by EEGUnity parser extensions.

## Supported Formats

EEGUnity can parse these non-standard sources during dataset scanning:

- MATLAB files: `.mat`
- HDF5 EEGLAB files: `.set` (stored as MATLAB v7.3/HDF5)
- CSV or TXT time-series tables: `.csv`, `.txt`
- WFDB records: `.hea` + `.dat`
- EDF content with non-standard extension: `.rec`
- BrainVision `.vhdr` with broken internal sidecar references (automatic patch fallback)

## Step 1: Build Locator with Parser Extensions Enabled

```python
from eegunity import UnifiedDataset

ud = UnifiedDataset(
    dataset_path=r"path/to/dataset",
    domain_tag="my_dataset",
    num_workers=8,
    min_file_size=0,  # include small CSV/TXT files
)

locator = ud.get_locator()
print(locator[["File Path", "File Type", "Completeness Check"]].head())
print(locator["File Type"].value_counts(dropna=False))
```

## Step 2: Inspect Specific File Types

```python
wfdb_rows = locator[locator["File Type"] == "wfdbData"]
csv_rows = locator[locator["File Type"] == "csvData"]
hdf5_set_rows = locator[locator["File Type"] == "eeglab_hdf5"]

print("WFDB rows:", len(wfdb_rows))
print("CSV/TXT rows:", len(csv_rows))
print("HDF5 .set rows:", len(hdf5_set_rows))
```

## Step 3: Load a Non-standard Row with `get_data_row`

```python
from eegunity import get_data_row

# Example: read the first available WFDB row
row = wfdb_rows.iloc[0]
raw = get_data_row(row, preload=False)

print("Channels:", raw.info["nchan"])
print("Sampling rate:", raw.info["sfreq"])
```

The same `get_data_row` API works for `.mat`, `csvData`, `eeglab_hdf5`, and `.rec` rows.

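
Because one loader call covers every file type, a quick smoke test can sample a single row per type. The following is a minimal sketch, not part of EEGUnity: `load_first_per_type` is a hypothetical helper, and the demo rows and stand-in loader are invented so the block runs without EEGUnity installed. It only assumes that each row can be indexed by the `"File Type"` column name.

```python
from typing import Any, Callable, Dict, Iterable, Mapping


def load_first_per_type(
    rows: Iterable[Mapping[str, Any]],
    loader: Callable[[Mapping[str, Any]], Any],
    type_key: str = "File Type",
) -> Dict[str, Any]:
    """Call `loader` on the first row seen for each file type."""
    results: Dict[str, Any] = {}
    for row in rows:
        file_type = row[type_key]
        if file_type in results:
            continue  # this type has already been sampled
        try:
            results[file_type] = loader(row)
        except Exception as exc:
            # Store the exception so one unreadable file does not stop the loop.
            results[file_type] = exc
    return results


# Hypothetical stand-in rows and loader (assumptions for illustration only).
demo_rows = [
    {"File Path": "a.hea", "File Type": "wfdbData"},
    {"File Path": "b.csv", "File Type": "csvData"},
    {"File Path": "c.csv", "File Type": "csvData"},
]
sampled = load_first_per_type(demo_rows, loader=lambda row: row["File Path"])
print(sampled)
```

With a real locator, the rows could be supplied as `(row for _, row in locator.iterrows())` and the loader as `lambda row: get_data_row(row, preload=False)`, mirroring the call shown in Step 3.
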
## Step 4: Batch Validate Readability

```python
def can_read(row):
    try:
        _ = get_data_row(row, preload=False)
        return "ok"
    except Exception as exc:
        return f"error: {type(exc).__name__}"

status = ud.eeg_batch.batch_process(
    con_func=lambda row: row["Completeness Check"] != "Unavailable",
    app_func=can_read,
    is_patch=True,
    result_type="value",
    execution_mode="thread",
)

ud.eeg_batch.set_metadata("Read Check", status)
print(ud.get_locator()[["File Path", "File Type", "Read Check"]].head())
```

## Notes

- For WFDB parsing, install `wfdb`.
- For HDF5 `.set`, install `h5py`.
- `min_file_size` mainly affects CSV/TXT scanning; set it to `0` when testing small demo files.
- BrainVision `.vhdr` sidecar mismatch is retried automatically with patched temporary headers.
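
The optional dependencies listed in the notes can be checked before scanning, so that missing parsers surface early rather than as per-file read errors. This is a minimal sketch using only the standard library; `missing_optional_deps` is a hypothetical helper and not an EEGUnity function. It assumes the import names match the package names `wfdb` and `h5py`.

```python
from importlib.util import find_spec
from typing import Iterable, List


def missing_optional_deps(module_names: Iterable[str]) -> List[str]:
    """Return the top-level module names that are not importable here."""
    return [name for name in module_names if find_spec(name) is None]


# Check the two optional parsers mentioned in the notes.
missing = missing_optional_deps(["wfdb", "h5py"])
if missing:
    print("Install before scanning:", ", ".join(missing))
else:
    print("WFDB and HDF5 .set dependencies are available.")
```
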